Text File | 1986-11-30 | 6KB | 119 lines
GREPSMC.LBR
The chief content of this library is a translation of a public domain
version of the Unix (TM) utility grep into Small C a la Hendrix. This
implementation allows most of the standard control constructs (if, else,
while, do ... while, for, switch, goto, but not expr1 ? expr2 : expr3 ).
It supports only int and char variables, with at most 1 level of indirection
and a single subscript (char *c; and char d[10]; are allowed, but not
char **c; char *d[10]; or char e[10][3];).
Grep is a program primarily designed for printing out lines in files
matching (containing) a specified "regular expression". A particular case of
a regular expression is just a fully specified string of characters such
as "#define". Thus
        grep #define grepsmc.c
will list all lines in grepsmc.c containing "#define". But regular
expressions can specify more complicated patterns using "meta-characters"
such as '^', '$', '.', '*', '+', '-', '\' and '[...]'. For example '^' and
'$' match the beginning and the end of a line, respectively. Hence
        grep ^#define grepsmc.c
matches "#define" only if it starts in column 1, and
        grep ^$ grepsmc.c
matches only lines that are empty. The meta-character '.' matches any character
except the end of line. Thus "^..........$" matches only lines containing
exactly 10 characters (counting blanks and tabs), and "h..d" matches strings
"head", "heed", "hold", "hard", etc. '*' matches 0 or more repetitions of the
preceding character matched. Thus "a.*e.*i.*o.*u" matches any line containing
the five vowels in alphabetical order. '+' matches 1 or more repetitions of
the preceding character matched. Thus "a.+e.+i.+o.+u" matches lines containing
the five vowels in order separated by at least one character. A bracket pair
"[ ... ]" matches any of the symbols between the brackets. Thus "[bc]a[nt]"
matches any line containing "ban", "bat", "can", or "cat". '-' can be used
within brackets to indicate a range of characters. Thus "[A-Za-z]" matches
any upper or lower case letter, "[A-Za-z][A-Za-z0-9]*" matches any string
starting with a letter and continuing with zero or more letters or digits,
e.g., any C identifier that contains no underscores. The meta-character '\'
is used as a quoting character
or escape character to specify symbols that otherwise have special meaning
such as '[' or '*'. Thus "\[ *[0-9]+ *\]" matches any single C subscript
that is specified numerically, possibly surrounded with blanks (e.g., "[ 33]",
"[ 4 ]", or "[7]", but not "[ i ]").
There is a problem in CP/M (TM) in that lower case letters in the
command line are translated to upper case and blanks delimit arguments. Hence
in this version I have adopted the convention that any letter in a regular
expression is to be considered to be lower case unless it is immediately
preceded by '\', in which case it is always considered upper case. Moreover,
blanks and
tabs are coded as '_' and '`', respectively. Thus both "[\tt]he" and
"[\TT]HE" match lines containing either "The" or "the". And to locate all
blank lines (i.e., all lines that are either empty or contain only blanks or
tabs) one could use the pattern "^[_`]*$" . To match actual '_' or '`',
use '\_' or '\`'. Note that some of the examples in the preceding paragraph
need to be modified to work with the CP/M version. Thus to match an
identifier use "[\a-\za-z][\a-\za-z0-9]+" or "[\A-\ZA-Z][\A-\ZA-Z0-9]+".
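
The CP/M convention can be pictured as a translation step applied to the
command-line argument before matching. The sketch below, in Python, is an
illustration of the rules stated above and not the code in grepsmc.c; the
function name decode_cpm_pattern is hypothetical, and the treatment of an
escaped backquote as a literal backquote is an assumption.

```python
def decode_cpm_pattern(arg):
    """Translate a CP/M-convention pattern into a conventional one.

    Rules taken from the text: a letter is lower case unless immediately
    preceded by '\\'; '_' stands for a blank and '`' for a tab. Assumed
    here: '\\_' and '\\`' give the literal characters, and '\\' before
    anything else is kept as an ordinary escape.
    """
    out = []
    i = 0
    while i < len(arg):
        ch = arg[i]
        if ch == "\\" and i + 1 < len(arg):
            nxt = arg[i + 1]
            if nxt.isalpha():
                out.append(nxt.upper())      # \t or \T -> T
            elif nxt in "_`":
                out.append(nxt)              # literal underscore or backquote
            else:
                out.append("\\" + nxt)       # e.g. \[ stays an escape
            i += 2
            continue
        if ch == "_":
            out.append(" ")
        elif ch == "`":
            out.append("\t")
        else:
            out.append(ch.lower())           # CP/M delivers upper case
        i += 1
    return "".join(out)


print(decode_cpm_pattern(r"[\TT]HE"))
print(decode_cpm_pattern(r"[\A-\ZA-Z][\A-\ZA-Z0-9]+"))
```

Under these rules "[\tt]he" and "[\TT]HE" decode to the same pattern, which
is why both forms behave identically.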
Usage of grep
The general form of an invocation of grep is as follows ([ ... ]
signifies an optional component):
        grep [ -Flags ] RegularExpression FileList [ > OutputFile ]
or
        grep [ -Flags ] RegularExpression [ < InputFile ] [ > OutputFile ]
where Flags is a sequence of letters from 'ncfhv', RegularExpression is a
pattern as described above, FileList is a list of files to be scanned, and
OutputFile is the optional file on which the output will be put. In the
second form, InputFile is a single file to be scanned. If both FileList and
"< InputFile" are omitted, grep will expect its input from the keyboard,
terminated by ^Z (CTRL Z). This is a useful way to experiment with what a
particular pattern matches. If more than one file is scanned, the file name
is printed with each line matched unless the f flag is used (see below).
The meaning of the flags is as follows:
     n   print line numbers of lines matched.
     f   reverse default for printing file name, i.e., print name if
         only 1 file is scanned, omit if more than 1 is scanned.
     v   print only lines that do not match.
     c   print only the total number of lines matched (or not matched
         if v is specified).
     h   print help information (some additional meta-characters are
         described).
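
How the n, f, v and c flags combine can be sketched as follows. This is a
Python illustration of the behavior described above, not the logic of
grepsmc.c. The function name grep_sketch, the in-memory file table, the
"name: number: line" output layout, and the choice to report one count
across all files are assumptions; the h flag is left out.

```python
import re


def grep_sketch(pattern, files, flags=""):
    """files maps a file name to its list of lines (without newlines).

    Returns the lines that would be printed, as a list of strings.
    """
    regex = re.compile(pattern)
    show_name = len(files) > 1          # default: name only if several files
    if "f" in flags:
        show_name = not show_name       # f reverses that default
    output = []
    count = 0
    for name, lines in files.items():
        for number, line in enumerate(lines, start=1):
            selected = regex.search(line) is not None
            if "v" in flags:
                selected = not selected  # v keeps the non-matching lines
            if not selected:
                continue
            count += 1
            if "c" in flags:
                continue                 # c suppresses the lines themselves
            prefix = ""
            if show_name:
                prefix += name + ": "
            if "n" in flags:
                prefix += str(number) + ": "
            output.append(prefix + line)
    if "c" in flags:
        output.append(str(count))        # assumed: a single total
    return output


sample = {
    "a.c": ["#define X 1", "int x;", ""],
    "b.c": ["  #define Y", "#define Z"],
}
for text in grep_sketch("^#define", sample, "n"):
    print(text)
```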
Other Files
Also included are the files used to create GREP.COM (with the exception
of the compiler itself). The version of the compiler I have produces assembler
code suitable for ASM. To avoid having to reassemble the I/O and system-
related functions, I have created HEX files IOLBCALL.HEX and LIBASM.HEX,
together with header files, STDIOCB.H and LIBASM.H that provide EQU's for the
entries in the HEX files. IOLBCALL.HEX contains most of the standard C I/O
functions (getc, getchar, fgets, putc, fopen, fclose, etc.; see STDIOCB.H for
others included), as well as the run-time routines needed by the compiled
code. LIBASM.HEX contains printf and fprintf (which recognize %c, %s, %d, and
%x) and
supporting routines. They were compiled from a library copyrighted by Jim
Hendrix, modified so as to provide fprintf. Also included is CATLOAD.COM,
a program allowing creation of a COM file from several HEX files. Usage is
        catload file1.hex [file2.hex ... ] comfile.com
Catload can produce a COM file up to about 30K. With SMC.COM (the compiler),
CATLOAD.COM, STDIOCB.H, LIBASM.H, IOLBCALL.HEX, and LIBASM.HEX on drive a:,
the sequence I used to create GREP.COM was
        A>SMC B:GREPSMC.C > B:GREPSMC.ASM
        A>ASM GREPSMC.BBZ
        A>CATLOAD IOLBCALL.HEX LIBASM.HEX B:GREPSMC.HEX B:GREP.COM
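
For readers curious what combining several HEX files into one COM file
involves, here is a rough sketch in Python. It is not CATLOAD's actual
method, which is not described here. It assumes the HEX files are in Intel
HEX format with 16-bit addresses and that the COM image starts at 0100H,
the usual CP/M load address. The function name hex_to_com is hypothetical,
gaps are filled with zero bytes by assumption, and the 30K limit is not
modeled.

```python
def hex_to_com(hex_texts, origin=0x100):
    """Merge the data records of several Intel HEX texts into one image.

    hex_texts is a list of strings, each the contents of one HEX file.
    Returns the bytes from `origin` up to the highest address written.
    """
    image = {}
    for text in hex_texts:
        for line in text.splitlines():
            line = line.strip()
            if not line.startswith(":"):
                continue                     # skip blank or foreign lines
            raw = bytes.fromhex(line[1:])
            if sum(raw) & 0xFF != 0:         # all bytes must sum to 0 mod 256
                raise ValueError("bad checksum in record " + line)
            length = raw[0]
            address = (raw[1] << 8) | raw[2]
            record_type = raw[3]
            if record_type != 0 or length == 0:
                continue                     # end-of-file or non-data record
            for offset, value in enumerate(raw[4:4 + length]):
                image[address + offset] = value
    used = [a for a in image if a >= origin]
    if not used:
        return b""
    # Unwritten addresses inside the image are filled with zero (assumption).
    return bytes(image.get(a, 0) for a in range(origin, max(used) + 1))


# Two tiny made-up HEX files: three bytes at 0100H and one byte at 0105H.
part1 = ":03010000C3030135\n:00000001FF\n"
part2 = ":01010500C930\n:0000000000\n"
print(hex_to_com([part1, part2]).hex())
```

Later files overwrite earlier ones where their addresses overlap, so the
order of the HEX files on the command line could matter in such a scheme.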
Christopher Bingham
792 Osceola Avenue
St. Paul, MN 55105
July 25, 1986.